Guessers for Finite-State Transducer Lexicons
Language software applications encounter new words, e.g., acronyms, technical terminology, names or compounds of such words. To add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new, generally applicable method for creating an entry generator, i.e. a paradigm guesser, for finite-state transducer lexicons. Because a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on full-scale Finnish, English and Swedish transducer lexicons. We use the open-source Helsinki Finite-State Technology to create finite-state transducer lexicons from existing lexical resources and automatically derive guessers for unknown words. The method has a recall of 82-87 % and a precision of 71-76 % for the three test languages. The model needs no external corpus and can therefore serve as a baseline. Peer reviewed
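The idea of ranking paradigm suggestions so that the correct one appears among the first few candidates can be illustrated with a much simpler, hypothetical suffix-matching guesser. This is not the transducer-based method of the paper and uses no HFST API; it is a minimal sketch, assuming a plain dictionary from known lemmas to paradigm labels (the lemmas and labels such as N_y_ies are invented for illustration), that ranks paradigms by the longest suffix an unknown word shares with any known lemma of that paradigm.

```python
def guess_paradigms(word, lexicon, max_candidates=3):
    """Rank inflectional paradigms for an unknown word (illustrative sketch).

    lexicon: dict mapping a known lemma to a paradigm label (hypothetical data).
    Paradigms are ranked by the longest suffix shared with one of their lemmas,
    then by how many lemmas reach that length, then alphabetically.
    """
    best = {}  # paradigm -> (longest shared suffix length, lemmas with that length)
    for lemma, paradigm in lexicon.items():
        # Length of the longest common suffix of the unknown word and this lemma
        n = 0
        while n < min(len(word), len(lemma)) and word[-1 - n] == lemma[-1 - n]:
            n += 1
        prev_len, prev_count = best.get(paradigm, (0, 0))
        if n > prev_len:
            best[paradigm] = (n, 1)
        elif n == prev_len:
            best[paradigm] = (n, prev_count + 1)
    ranked = sorted(best, key=lambda p: (-best[p][0], -best[p][1], p))
    return ranked[:max_candidates]


# Hypothetical toy lexicon: English nouns with invented paradigm labels
EXAMPLE_LEXICON = {
    "berry": "N_y_ies",
    "city": "N_y_ies",
    "day": "N_s",
    "key": "N_s",
    "box": "N_es",
    "cat": "N_s",
}

if __name__ == "__main__":
    print(guess_paradigms("ferry", EXAMPLE_LEXICON))  # ['N_y_ies', 'N_s', 'N_es']
```

A longest-suffix ranking is only a rough stand-in for the paper's approach, but it shows why the order of candidates matters: the unknown word "ferry" receives the berry/berries paradigm first because it shares the four-letter suffix "erry" with "berry".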
Language and Dialect Identification of Cuneiform Texts
This article introduces a corpus of cuneiform texts, from which the dataset for the Cuneiform Language Identification (CLI) 2019 shared task was derived, and reports some preliminary language identification experiments conducted using that corpus. We also describe the CLI dataset and how it was derived from the corpus. In addition, we provide some baseline language identification results using the CLI dataset. To the best of our knowledge, the experiments detailed here are the first time automatic language identification methods have been applied to cuneiform data.
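The abstract does not say which baseline classifier was used, so the following is only a generic illustration of automatic language identification: a character n-gram naive Bayes classifier with add-alpha smoothing, written with the Python standard library. The training strings and the labels toy_a and toy_b are hypothetical Latin-script toy data, not material from the CLI dataset, which is written in cuneiform script.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Character n-grams of a text padded with one space on each side."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (illustrative sketch)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n
        self.alpha = alpha  # add-alpha (Laplace) smoothing constant
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()  # label -> total number of n-grams
        self.docs = Counter()  # label -> number of training texts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)

        def log_score(label):
            # log prior + sum of smoothed log likelihoods of each n-gram
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * v
            for g in grams:
                score += math.log((self.counts[label][g] + self.alpha) / denom)
            return score

        return max(self.docs, key=log_score)


# Hypothetical toy training data (not from the CLI dataset)
TRAIN_TEXTS = ["lugal e2 gal", "dumu lugal", "e2 lugal",
               "sarrum bitum", "bit sarrim", "sarru bitu"]
TRAIN_LABELS = ["toy_a"] * 3 + ["toy_b"] * 3

if __name__ == "__main__":
    model = NgramNaiveBayes(n=2).fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(model.predict("lugal dumu"), model.predict("sarrum bit"))
```

Character n-grams are a common choice for this task because they need no tokenizer or morphological analysis; on real cuneiform data the n-grams would be sequences of signs rather than Latin letters.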
Improving Word Association Measures in Repetitive Corpora with Context Similarity Weighting
Peer reviewed
Weighted Finite-State Morphological Analysis of Finnish Compounding with HFST-LEXC
Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009.
Editors: Kristiina Jokinen and Eckhard Bick.
NEALT Proceedings Series, Vol. 4 (2009), 89-95.
© 2009 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/9206
FIN-CLARIN – en humanistisk forskningsinfrastruktur med betoning på språk (FIN-CLARIN: a humanities research infrastructure with an emphasis on language)
Billions of words and thousands of hours of audio and video are needed as material for research in the humanities, and in language research in particular. In addition, researchers need tools to refine their own data collections and to compare them with public data collections. When a research project ends, storage and distribution facilities are needed to make the raw data, tools and research results accessible and usable. Together, data, tools and shared ways of using them form a research infrastructure, which makes it possible to verify earlier results and to reach new findings more efficiently, since not everyone has to start from scratch collecting data and building analysis tools.
Laundry Symbols and License Management: Practical Considerations for the Distribution of LRs based on experiences from CLARIN
One of the most challenging tasks in building language resources is copyright license management. There are several reasons for this. First of all, the current European copyright system is designed to a large extent to serve commercial actors, e.g. publishers and record companies. This means that the scope and duration of the rights are very extensive, and there are even certain forms of protection that do not exist elsewhere in the world, e.g. the database right. On the other hand, the exceptions for research and teaching are typically very narrow. Peer reviewed
Italian Language and Dialect Identification and Regional French Variety Detection using Adaptive Naive Bayes
Peer reviewed
The CLARIN Committee for Legal and Ethical Issues and the Normative Layer of the CLARIN Infrastructure: Ville Oksanen, in memoriam (26 December 1976 – 23 November 2014)
Publisher Copyright: © 2022 Darja Fišer and Andreas Witt, published by Walter de Gruyter GmbH, Berlin/Boston. All rights reserved. Peer reviewed